Extracting Information from a Domain of Unstructured Data
نویسندگان
چکیده
Information extraction is a process of extracting relevant data in a specified structured format from semi-structured and unstructured data sources. Extracting information from a collection of unstructured documents allowing reasonable range of fault tolerance is a challenging problem. Existing methodology includes statistical training methods that require enormous training time, and which yield trained models biased to erroneous data. To avoid these weaknesses, we propose a similarity maximization methodology that requires a very small amount of human coding and uses an integer-programming (IP) framework to extract appropriate information.
منابع مشابه
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
متن کاملارائه مدلی برای استخراج اطلاعات از مستندات متنی، مبتنی بر متنکاوی در حوزه یادگیری الکترونیکی
As computer networks become the backbones of science and economy, enormous quantities documents become available. So, for extracting useful information from textual data, text mining techniques have been used. Text Mining has become an important research area that discoveries unknown information, facts or new hypotheses by automatically extracting information from different written documents. T...
متن کاملiDocument: Using Ontologies for Extracting and Annotating Information from Unstructured Text
Due to the huge amount of text data in the WWW, annotating unstructured text with semantic markup is a crucial topic in Semantic Web research. This work formally analyzes the incorporation of domain ontologies into information extraction tasks in iDocument. Ontologybased information extraction exploits domain ontologies with formalized and structured domain knowledge for extracting domain-relev...
متن کاملAnswering Structured Queries on Unstructured Data
There is growing number of applications that require access to both structured and unstructured data. Such collections of data have been referred to as dataspaces, and Dataspace Support Platforms (DSSPs) were proposed to offer several services over dataspaces, including search and query, source discovery and categorization, indexing and some forms of recovery. One of the key services of a DSSP ...
متن کاملA Novel Method for Selecting the Supplier Based on Association Rule Mining
One of important problems in supply chains management is supplier selection. In a company, there are massive data from various departments so that extracting knowledge from the company’s data is too complicated. Many researchers have solved this problem by some methods like fuzzy set theory, goal programming, multi objective programming, the liner programming, mixed integer programming, analyti...
متن کامل